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Abstract 

This paper investigates the dynamics of infectious diseases with a non-exponentially dis¬ 
tributed infectious period. This is achieved by considering a multi-stage infection model on 
networks. Using pairwise approximation with a standard closure, a number of important char¬ 
acteristics of disease dynamics are derived analytically, including the final size of an epidemic 
and a threshold for epidemic outbreaks, and it is shown how these quantities depend on disease 
characteristics, as well as the number of disease stages. Stochastic simulations of dynamics on 
networks are performed and compared to the results of pairwise models for several realistic ex¬ 
amples of infectious diseases to illustrate the role played by the number of stages in the disease 
dynamics. These results show that a higher number of disease stages results in faster epidemic 
outbreaks with a higher peak prevalence and a larger final size of the epidemic. The agreement 
between the pairwise and simulation methods is excellent in the cases we consider. 


1 Introduction 

Mathematical models of infectious diseases are known to provide an invaluable insight into the 
mechanisms driving disease invasion and spread. In many cases, to obtain the first approximation 
of the spread of a disease it is sufficient to use a version of the classical SIR model (Kermack and 
McKendrick 1927) [40]. However, major outbreaks of avian and swine influenza (Ferguson et al. 
2006) [23], SARS (Donnelly et al. 2003) [18], and more recently, ebola (Chowell and Nishiura 2014) 
ns], have highlighted the need for a more accurate description of the disease dynamics that would 
provide predictive power to be used for developing measures for disease control and prevention 
(Keeling and Rohani 2008) [38]. 

One of the major simplifying assumptions often used in mathematical models of disease dy¬ 
namics is the exponential distribution of infectious periods. Effectively, this means that the chance 
of an individual recovering during any given time period does not depend on the duration of time 
that individual has already been infected. Whilst such an assumption may provide significant 
mathematical convenience and be reasonably realistic in some situations, most often it is violated, 
and this requires the inclusion of the precise distribution of infectious periods in the model (Bailey 
1954; Hope-Simpson 1952) [HE!]. There are several methods that can be employed to explicitly 
include a non-exponential distribution, including a multi-stage approach (Anderson and May 1992; 
Cox and Miller 1965) [2,13], an integro-differential formulation (Kermack and McKendrick 1927; 
Hethcote and Tudor 1980; Keeling and Grenfell 1997) [ID] G2 36j , and a PDE-based formulation 
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akin to that for age-structured models (Anderson and May 1992) pQ . In the multi-stage frame¬ 
work, it is assumed that the infectious stage of a disease is characterised by a number K of distinct 
stages (Cox and Miller 1965; Lloyd 2000; Lloyd 2001) HUfEj 12] > with the duration of each stage 
being an independent exponentially distributed random number. Due to the fact that the sum 
of independent exponentially distributed random variables obeys a gamma distribution (Durrett 
2010) [19] . one can replace an exponential distribution with the mean infectious period I /7 by a 
gamma distribution T(K, lf(K'y)) that has the same mean infectious period I/ 7 . The so-called 
linear chain trick (Cox and Miller 1965; MacDonald 1978) (14, |45j then consists in replacing a 
single infectious stage with K identical exponentially distributed sub-stages, each having a mean 
period l/(Ivy). These multiple stages of infection can be used to represent periods of increased 
or decreased risk of transmitting the disease (Ma and Earn 2006) (Bj. The same approach can 
be extended to models with multiple classes (Keeling and Grenfell 2002; Nguyen and Rojani 2008) 
EE] EE], as well as non-exponentially distributed latency and temporary immunity periods (Blyuss 
and Kyrychko 2010; Wearing and Rojani 2005) [ 8 1 , [58]. Following the methodology of introducing 
multi-stage of infection to better represent the distribution of infectious periods, we proceed with 
dividing the infected population into K identical stages R, R, . . ., Ik to create the so-called SI K R 
model (Lloyd 2000) [42], and we denote the total infected population by / = One should 

note that K'y is now used as the transition rate between successive infectious stages in order to 
keep the average duration of infection as I/ 7 . With these notations, the SI h R model takes the 
form 

dS/dt = -PSI, 
dh/dt = (5SI - K~,I \, 
dl 2 /dt = K'yh - KjI-2, 

dI K /dt = K^/Ik- 1 - KjI k , 

dR/dt = Ik, 

where S denotes the proportion of susceptible individuals, R is the proportion of recovered or 
removed individuals, /3 is the disease transmission rate taken to be the same for all stages of 
infection, and the disease is assumed to confer a life-long immunity. The importance of including 
not just the mean infectious period, but the actual distribution of infectious periods, as achieved by 
the system ([ 1 ]) is further highlighted by the inspection of actual values of epidemiological parameters 
for several real diseases as presented in Table [T] This table illustrates that whilst the transmission 
rate and the average infectious period may vary between different diseases, in all of these cases 
the number of stages that has to be included in order to correctly represent the disease dynamics 
may also be quite high, this reinforces an earlier observation about the non-exponential nature of 
infectious period distribution. 

Whilst this method of introducing multiple stages of infection is clearly more realistic, the 
assumption of a homogeneous fully mixed population remains very important, having significant 
effects on the disease dynamics (Keeling and Rohani 2008) [38] . Although this assumption often 
provides a good approximation that helps reduce complexity of the model, in many cases it is just 
not realistic and results in erroneous conclusions about the onset and development of epidemic 
outbreaks (Bansal et al. 2007; Burr and Chowell 2008) j5,il2j. To address this issue, networks have 
been and are being used successfully to model the contact structure of the population to a high 
degree of detail (Danon et al. 2011; Keeling and Earnes 2005) [15, 35]. Typically, network models 
are parameterised with empirical data or synthetic models that can be either purely theoretical, 
e.g. homogeneous random networks or Erdos-Renyi random graphs, or obey some widely observed 
network characteristics, such as a particular degree distribution or clustering. However, with added 
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Table 1: Estimates of epidemiological parameters for different infectious diseases. 


Disease 

P 

7 -1 (days) 

Stages K 

Source(s) 

Measles 

Seasonal 

5 

20 

(Hope-Simpson 1952) [2B] 

SARS 

0.545 

5-6 

3 

(Bauch et al. 2005; Riley et al. 2003) [6l [53] 

Influenza 

1.66 

2.2 

3 

(Keeling and Rojani 2008; Cauchemez et al. 2004) [38i |10] 

Smallpox 

0.49 

8.6 

4 

(Ferguson et al. 2003; Koplan, Azizullah and Foster 1978) [23, | 


model realism comes complexity, which in the case of epidemic network models can be handled 
via mean-field models, such as pairwise models (Keeling 1999; House and Keeling 2011a) (53) [29] 
that are able to better account for the explicit nature of network links. As long as such mean-field 
models provide a good approximation to the explicit stochastic network models, they open up the 
possibility to analyticaly compute important quantities such as epidemic threshold, final epidemic 
size and so on. Thus, the explicit stochastic network simulation model and the pairwise model 
combine favourably to provide a more accurate model with some degree of analytical tractability. 

In this paper we are concurrently relaxing the assumptions of homogeneous random mixing and 
exponentially distributed infectious periods to generate a multi-stage pairwise model for the spread 
of epidemics on networks. The paper is organised as follows. The next Section contains a brief 
summary and discussion of earlier results on the properties of the SI K R model dU). In Section 3 we 
employ the framework of pairwise approximations to derive a multi-stage infection pairwise model 
and use this to derive analytical expressions for the probability of transmission of infection along an 
infected edge in a network, a threshold parameter controlling the onset of epidemic outbreaks, and 
the final size of an epidemic. In Section 4 numerical simulations of the pairwise and the full network 
models are performed using realistic parameter values from Table [T] to investigate the accuracy of 
pairwise approximation and to illustrate the role played by the number of stages in the multi-stage 
distribution in the disease dynamics. The paper concludes in Section 5 with discussion of results 
and future outlook. 

2 Dynamics of the well-mixed model 

As a first step, we consider the SI K R model ([!]), which has an implicit assumption that every 
member of the population has a sufficient level of contact so that the infection can be passed from 
any individual to any other. This is a natural extension of the basic SIR model (Kermack and 
McKendrick 1927) [40], and as such it has been well-studied in a number of papers (Lloyd 2000; 
Ma and Earn 2006; van den Driessche and Watmough 2002) [321 [331 [57 ]. 

Perhaps, one of the most important and commonly used parameters characterising the severity 
of epidemics and stability of the disease-free equilibrium is the basic reproduction number 77o defined 
as the expected number of secondary infections caused by a single typical infectious individual in 
a wholly susceptible population. The value of IZq is related to the stability of the disease-free 
equilibrium, and it is an important threshold parameter signifying that an epidemic will spread 
when 77o > 1 and die out otherwise. 

The basic reproduction number for the system ([!]) can be found as follows (Hyman and Stanley 
1999; Ma and Earn 2006; van den Driessche and Watmough 2002) [32, [44], 57] 

n 0 = ( 2 ) 
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Figure 1: A comparison of infection dynamics for a one-, three- and five-stage SI K R models with 
data from (Keeling and Rojani 2008) [38) ■ Each curve represents the sum of all in the model. 
Adding extra stages causes epidemics to occur earlier and result in a higher peak of epidemic, 
although TZq and the final size are identical for each curve. The parameter values are N = 1000, 
/3 = 1.66/day, 7 = 0.4545/day. 

which depends on the average duration of infection I /7 but is independent of the number of stages 
in the model. A practically important characteristic of an epidemic outbreak is the final epidemic 
size (Keeling and Rojani 2008) [38]. Since the total population size is closed with no inflow or 
outflow of individuals, i.e. S(t ) + h(t) + / 2 (f) + ... + Ixit) + R(t ) = 1, at the end of an outbreak 
we have a burn-out of the epidemic, i.e. I\ = I 2 = ■ ■ ■ = Ik = 0, and hence 5(oo) + R( 00 ) = 1 and 
R( 00 ) = 1 — 5 ( 00 ). This results in the following implicit equation for the final size of an epidemic 
that determines the proportion of individuals not affected by the disease (Anderson and May 1992; 
Diekmann and Heesterbeek 2000) [TJ [T7] 

R{ 00 ) = 1 - e- n ° R{oo \ (3) 

which coincides with the final epidemic size in the original SIR model (Kermack and McKendrick 
1927) [40]. Ma and Earn (2006) f44] have recently discussed various aspects related to the derivation 
and validity of formula ([3|), and Andreasen (2011) [3] has studied the effects of population hetero¬ 
geneity on the size of epidemic. A major implication of the above results is the fact that inclusion 
of possibly more realistic gamma distribution of infectious periods does not alter the threshold of 
an epidemic outbreak, nor does it affect the final epidemic size. One should note, however, that 
when a stochastic version of the SI K R model is considered, the number of stages influences the 
distribution of final epidemic sizes, while the average final size remains the same (Black and Ross 
2015; House et al. 2013) [7], 31). We see that in Fig. Q] the three curves show that considering 
multi-stage infectious periods has a significant effect of the dynamics of the epidemic. In order 
to get a better understanding of the distinction in the dynamics of SIR and SI h R models, it is 
therefore instructive to look at the development of epidemics. In the standard SIR. model, an 
outbreak can only take place if IZq > 1, and at the initial stage, the number of infected individuals 
can be approximated as I(t) « I(0)exp(At), where the growth rate is A = 7 ( 77.0 — 1)- In the case 
of a multi-stage SI K R model, however, the basic reproduction number TZq does not depend on 
the number of stages, hence, it cannot by itself be used to determine the exponential growth rate 
during an early stage of an outbreak. For this model Wearing et al. (2005) [58] have derived the 
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Figure 2: Proportion of infected individuals during a boarding school influenza outbreak with 
/3 = 1.66/day and 7 = 0.4545/day (Keeling and Rojani 2008) [38] . In each plot the solid black line 
is the numerical solution of the model (P) with an appropriate number of stages, and the dashed line 
is the exponential growth curve with the rate determined by equation Q shown on a logarithmic 
scale, (a) One-stage model with A ~ 1.2055, (b) Two-stage model with A « 1.4035, (c) Three-stage 
model with A ~ 1.4762, (d) Five-stage model with A ~ 1.534. In each case note that in the earliest 
stages the exponential approximation is virtually identical to the infection curve. 


following relation between the basic reproduction number IZq and the initial growth rate A 

A 


77-n — 


7 1 - 


K 7 


+ 1 


-K 


(4) 


Figure [5] illustrates early dynamics of epidemic outbreaks for different numbers of stages; in each 
case an exponential curve was fitted, which provides an accurate approximation for the initial 
growth rate of the infection as determined by equation Q. This figure shows the effects of the 
gamma distribution on the early growth rate, peak prevalence and overall time frame of the disease, 
and it also suggests that the largest effect of the gamma distribution on the disease dynamics occurs 
during intermediate stages of disease progression. 

Besides the basic reproduction number, final epidemic size and the initial growth rate of an 
epidemic, another practically important characteristic of epidemic outbreaks is the peak prevalence 
defined as the maximum number or proportion of infected individuals that can be achieved during 
an outbreak. In the case of an SIR model, the peak prevalence can be found as follows (Feng 2007; 
House and Keeling 2011b) [22l [30 j 

Imax = 1 ~ vy [1 + hr(77o)]. 

/Co 

Feng (2007) [22] has recently considered an SEIR model with gamma distributed infectious period 
and derived an expression for the peak of a weighted average of infectious compartments. This 
result gives some intuition into how the number of stages affects peak prevalence, but it does not 
provide a closed form expression for the actual peak prevalence in an SI K R model. Numerical 
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results in Fig. [2] suggest that for the same average infectious period, the overall peak prevalence 
increases with the number of stages included in the model. 

3 Network dynamics with multiple stages 

Inclusion of multiple stages of infection in the SI K R model gives a more realistic representation of 
the infectious period, but the model still has certain limitations due to its underlying assumptions. 
In the model (H|) it is assumed that the disease is not fatal, and that transitions between different 
infected classes, or stages of infection, take place at exactly the same rate Kj. Another major 
assumption behind model JT]) is that the population is well-mixed, i.e. each individual has equal 
chances of encountering and transmitting a disease to any other individual in a population. Whilst 
this may be appropriate in the case of outbreaks in small closed communities, for a large number 
of communicable diseases, such as SARS, influenza and most sexually transmitted infections, this 
assumption is a gross simplification of the actual dynamics as it overlooks spatial variability, as 
well as the complexities of a network structure for infections that are transmitted through direct 
close contact between individuals (House and Keeling 2011a; Keeling and Eames 2005) { 29l 135] . 

Modelling complex contact patterns explicitly via networks has had a profound effect on math¬ 
ematical epidemiology. This new modelling framework has led to a myriad of models ranging from 
exact to mean-held and simulation models (Pastor-Satorras et al. 2014; Danon et al. 2011; Keeling 
and Eames 2005; Newman 2003; Boccaletti et al. 2006)[52j |T5j 35, H9j [9|. The many degrees of 
freedom in modelling offered by networks however, often comes at the price of increasing levels of 
complexity, where models can be challenging to evaluate analytically and sometimes even numer¬ 
ically. Nevertheless, many valuable paradigm models have been developed which have furthered 
our understanding of the impact of contact heterogeneity, preferential mixing and clustering on 
the outbreak threshold and other epidemic descriptors. A particularly useful way of capturing 
epidemic dynamics on networks is by using the pairwise model (Keeling 1999) [34] . This model is 
based around deriving in a hierarchical way evolution equations for the expected number of nodes, 
edges, triples and so on. A closure is then employed that curtails the dependence on ever higher 
order moments. Its premise is simple and quite intuitive, although it can be also shown rigorously 
(Taylor et al. 2012) [56] that pairwise models before closure are exact. The basic idea of the model 
is to recognise that changes at node level depend on the status of the neighbours and thus involves 
edges, e.g. the rate of change in the number of infectious nodes is proportional to the number of 
S — I links in the network. Similarly, the number of edges can change due to pair interactions 
and transitions but also due to interactions induced from outside the edge, e.g. the number of 
S — S links decrease proportionally to the number of S — S — I triples, where infection from the I 
node destroys the fully susceptible pair. This framework has been used and extended extensively, 
to asymmetric (Sharkey et al. 2006) [55j and weighted networks (Rattana et al. 2013) [53] for 
example, and has proved to be a valuable framework. 

3.1 Pairwise model 

As a first step in the analysis of dynamics of multi-stage epidemics on networks, we re-formulate the 
SI K R model using the framework of pairwise equations , which allows one to analyse the expected 
values for the number of nodes and links of each type as a function of time (Keeling 1999; House 
and Keeling 2011a; Taylor et al. 2012) [Ml [29] 56]. The particular strength of pairwise models 
lies in their analytical tractability and the fact that they provide a more accurate description than 
well-mixed ODE models but do not go to the level of full individual-based stochastic simulations 
(House and Keeling 2011a) [29]. In this formalism of pairwise models, notations [X], [XX] and 
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[XYZ\ are used to denote the expected numbers of individuals in state X, the expected number of 
links between nodes of type X and Y and the expected number of triples of the form X — Y — Z, 
respectively. More precisely, given a ‘frozen’ network with nodes labels X , Y or Z and subscripts 
indicating nodes i,j and k then 

N N N 

[X] = Y^Xi, [XY] = Y XiY jgij , [XYZ\ = Y x i Y j z k9ij9jk, 

*=1 .7=1 i,j,k=l 

where X, Y, Z E {S, I\, I 2 , ■ ■ ■, Ik, R}, and G = (gij)i,j=i,2,...,N is the adjacency matrix of the 
network such that gu = 0, gij = gji and gtj = gji = 1 if nodes i and j are connected and zero 
otherwise. Moreover, A, returns one if node i is in state X and zero otherwise. The average degree 
of each node is denoted by n, and the number of nodes in the network by N. The new pairwise 
SI K R model with a gamma distributed infectious period can then be written as follows, 

[s] = -rEl 1 m, 

[A]=tE£ 1 m-K-ylh], 


[ij} = A 7 [/,_,] - Kj[Ij ], for j = 2,3,..., K. 
[SS] = -2r'Z« 1 [SSI i \, 


[5/r] = -(r + K^[Sh\ + r [^=1 [SSI*] - Eh [Wi]) , 


[Slj] 


-(T + K'y)[SI j ] + K'y[SI j - 1 ]-rEhi[IiSI j ], for j = 2,3,...,K. 


where r = (3/n is the transmission rate per link. Since we consider a closed population, this 
immediately implies [£]+JT =1 K[Ii] + [R\ = N. The system (|5|) is not closed as additional equations 
describing the dynamics of triples are needed. To eliminate this dependence on higher moments 
and close the system, we will use the classical moment closure approximation which assumes that 
short loops and clusters are excluded from the network and that there is no correlation between 
nodes with a common neighbour (Keeling 1999) [34]. 


w d-: 1 ’? , for i-UK, 


n 


[5] 


for iJ = 1:Km 


( 6 ) 


n [5] 

Applying these closures to the system ([S]) makes it a self-consistent system of (2 K + 2) equations. 


3.2 The probability of transmission across an infected edge 

When one considers a stochastic network-based simulation, an important quantity characterising 
the disease dynamics is the probability f of disease transmission across a given S—I link. In a simple 
one-stage model, where both infection and recovery are assumed to be distributed exponentially, the 
probability of no infection event occurring during time t is given by po(t) = e~ Tt ] hence 1 — po(t) 
is the probability that infection does take place over the same time period. Averaging this via 
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integration for all possible recovery times yields the probability that the susceptible node becomes 
infected. In a standard SIR model with exponentially distributed infectious and recovery period, 
this probability is, therefore (Danon et al. 2011; Diekmann, De Jong and Metz 1998) [ T5l I16| 


f = 1 — 


7 


r + 7 r + 7 


(7) 


In the case of an SI K R model, the duration of infection is described by the density function of the 
appropriate gamma distribution 


g(x;K,l/(K~f)) = 


1 


(A'-l)! 


{Kj) k x K ~ 1 e~ K ' yx . 


( 8 ) 


The implication of this fact is the following result for the probability of transmission across an edge. 
Lemmal. For the stochastic SI K R model with the period of infection following the gamma distri¬ 
bution m, the probability of disease transmission across a given S — I link is given by 


f = 1 — 


K'y 


t + K 7 

The proof of this lemma is given in Appendix A. 

By rewriting expression @ in the form 


K 


(9) 


f = 1 — 


K'Y + T — T 


K 


= 1 - 1 - 




K 


t + K'y J \ t + Kj , 

and using the fact that e x = lim n ^, 00 (1 + x/n) n , it follows that 


lim t{K) = 1 — exp ( — — ). (10) 

K^-oo \ 7 J 

Figure [3] illustrates the dependence of f on the number of stages K, as well as a limiting behaviour 
as K —> oo. This figure illustrates that while f is growing with the increasing number of stages K , 
it eventually saturates at a level determined by Eq. (flOl) . In fact, this saturation at higher K is 
observed not only in the probability of transmission, but also in the peak prevalence rate, as well 
as in the early growth rate. When compared to an exponential distribution, it is these substantial 
changes in f observed for smaller values of K that explain the changes in the profile of the infection 
curves. As will be shown later, f is a very important quantity that controls various properties of 
epidemic dynamics, such as the threshold for an outbreak and the final size of an epidemic. 


3.3 7^o-like threshold parameter 

Unlike epidemic models in well-mixed populations, defining an appropriate TZq for pairwise models is 
more challenging. This is in part due to the difficulty of identifying the typical infectious individual. 
In order to derive a value for IZq, one needs to consider and correctly account for the correlation 
between susceptible and infected nodes and measure TZq when this has stabilised, see Keeling (1999) 
[33] and Eames (2008) [i2Tj. Intuitively, this means that the epidemic is allowed to spread in order 
to become established in the network. This allows for typical’ infectious individuals to develop and 
for IZq to be measured. In large networks this regime can still be considered to be close to or only 
a small perturbation away from the disease-free steady state. 









Figure 3: Dependence of the probability of transmission across an S — I edge f on the number of 
stages I\ as given by Eq. @ for different mean infectious periods with r = 0.166. Crosses, circles 
and diamonds correspond to integer values of K on each curve. 


We now proceed to derive an 7 ^ 0 -hke threshold parameter TZ which can be used to predict when 
the epidemics occur, by allowing outbreaks only when 1Z > 1 (Rattana et al. 2013) [53]. To this 
end, we linearise the model d5|) with a classic closure ([ 6 l) near the disease-free equilibrium which has 
the form [S'] = N, [SS] = nN, and all other quantities being zero. As in the standard approach, the 
condition necessary for the initial growth of an epidemic is that the dominant eigenvalue \ max of 
the resulting characteristic polynomial is real and positive, and a threshold parameter is obtained as 
a condition on system parameters that ensure the stability change, i.e. A max = 0. In the Appendix 
B it is shown that the characteristic equation for eigenvalues A of the linearised system near the 
disease-free steady state for a A'-stage model © is given by 


A 2 (A + A 7 ) 


I< 


(r + K 7 + X) K — r(n — 1) 


(r + K'y + A) 


K — l 


+ EI-i\Kx) K - 1 (t + K^ + \) 


K-i/ 


\i —1 


= 0 . 


In Appendix B we prove that the largest eigenvalue A satisfying this equation goes through zero, 
he. A max = 0, when 

1Z:= {n — l)f = 1. (11) 

This defines a new 7£o-like threshold parameter with f introduced in ([9]). A closer inspection 
shows that this parameter 7Z describes the probability of spreading the disease across a given link 
multiplied by the likely number of susceptible contacts of the individual assuming that they are 
the earliest people being infected, which perfectly agrees with the standard definition of TZq as the 
average number of secondary cases produced in a fully susceptible population by a single typical 
infectious individual. Whilst 7Z does not quantify the early growth rate of an epidemic, through its 
dependence on f and K it allows one to better predict epidemic outbreaks in the case of a more 
realistic gamma distribution of infectious period, where in the case of an exponential distribution 
with the same mean infectious period. We also note that whilst in the implementation of the classic 
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SI K R model there was no effect of changing the number of stages on Rq, this more sophisticated 
model results in a threshold which implicitly accounts for multi-stage infectious periods. 


3.4 The final size of an epidemic 

Since the pairwise model (JSJ) is a network representation of an epidemic with life-long immunity and 
fixed population size, eventually an epidemic will burn out, leaving some proportion of the popu¬ 
lation unaffected and still susceptible to the disease. Since [ii](oo) = [J 2 K 00 ) = ... = [/a] ( 00 ) = 0, 
the final size of an epidemic is given by the proportion of people in the removed class, i.e. 
[i?]oo = N — [S]oo- As we saw earlier for the SI K R model (0) in a well-mixed population, the 
final size of a single epidemic does not change with the number of stages. However, the same 
conclusion no longer holds for the pairwise model (JSJ) with the closure (|6|) . in which case we have 
the following result. 


Theorem 1 For a single epidemic outbreak in a closed population with a vanishingly small starting 
level of infection, the final size of an epidemic in the pairwise model fW with the classical closure 
(01 is given by 

Roc = 1 - (1 - f + f6 ) n , 


where 


and t is defined in m- 


9 = (1 — f + Of) 


71—1 


( 12 ) 

(13) 


Proof. To prove this statement we extend the methodology developed by Keeling (1999) 
one-stage epidemics. We first introduce some new variables and parameters 


for 


n- 1 „ E£i [Sli] „ M 

CL — -, r — --1 (_r — 


n 


[s]“ 


[5]“’ 


L = 


[55] 


exp(n[5] 1 / n )[5] 2a ' 


and 


p.-m for i-12 K 

1 ~ [5]“ ° * - 


From © and the easily derived function 

it follows that these new variables satisfy the following system of equations 

F = -tF - K'jPk + arff)-F, 


M = 


[SS\ 

[5]“’ 


(14) 


G = K 7 P K: 

L = - aT ~[ 5f F ’ 


( 15 ) 


M = tMF. 
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Since [/«](0) = [I,;](oo) = 0 for any i = 1,, K, this implies F(0) = F( oo) = 0. Integrating the first 
equation in (fT3D gives 


poo poo poo r o cl 

F(oo) — F(0) = 0 = — r / Fdt — K 7 / Pxdt + aT / Fdt 

Jo Jo Jo h>J 

= — [ln(M(oo)) - ln(M(0))] - [G(oo) - G(0)] - [L(oo) - L(0)] 


(16) 


= — [ln(M(oo)) — ln(M( 0 ))] — f [L(oo) — L( 0 )] , 
where in the last step we have used the fact that G( 0 ) = 0 and the relation 

<?(°o) = = (f - l)[L(°o) - L( 0 )], 

l^Joo 

derived in Appendix C together with another relation 


2 a 
00 


[S5]oo = 

N a ~ l / n 

Substituting these two relations into Eq. (fl6l) and using the fact that [5] (0) = N yields 


(17) 


(18) 


0 = nN 1 / n - n[S]^ n - 


n[S] 


]\fa- 1/n 


52 - - nN 1 / 1 


= 1 - 



Introducing the fraction of susceptible individuals as 5 qo = [<S']oo/-^ ) the above equation can be 
rewritten as follows, 

l-SU n = f(S^-l), 

or alternatively, as another implicit equation for Soo 

Soo = (1 - t + T0) n , where 0 = S^. (19) 

Since [/]j(oo) = 0, introducing R 0Q = [Rloo/N yields the desired expression for the final size of an 
epidemic 

Roc = 1- Soo = l-(l-f + f0) n . 

Using the fact that 6 = S^, equation (11911 can be rewritten in the form 


0 V“ = (! -f + fd) n ^ 9 = (l-f + fO) 71 - 1 , 


where in the last step we have used the relation a = (n — 1 )/n. This completes the proof. ■ 


We note that our result in Theorem [T] is functionally identical to the result achieved by Keeling 
(1999) [33], and it generalises the final size equation by replacing r/(r + 7 ) with the parameter t. 
In the case K = 1 these two values are equivalent, thus we have perfect agreement with the existing 
theory. Equivalent relations have also been derived by Newman (2002) [48] using percolation theory. 
Those results were later corrected and shown to hold in all cases where the distribution of infectious 
periods is degenerative (Kenah and Robins 2007) [ 39] . An equivalent relation has been derived for a 
static configuration network model with an arbitrary degree distribution (Miller 2012) [47]. Figured] 
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Figure 4: Dependence of the final size of an epidemic (1121) on the per-link transmission rate r and 
the number of stages I\ in the pairwise model © with 7 = 0.4545 for different average node degrees, 
(a) n = 2. (b ) n = 4. (c) n = 7. (d) n = 10. (e) n = 4, r = 0.3 (solid), r = 0.6 (dashed), r = 0.9 
(dotted), (f) n = 10, r = 0.09 (solid), r = 0.18 (dashed), r = 0.27 (dotted). Circles correspond to 
integer values of K on each curve. The case n = 2 is used solely for illustrative purposes, as the 
resulting networks would be disconnected and thus inappropriate for direct comparison to results 
from the pairwise model. 


12 



















































Figure 5: Numerical solution of the pairwise SI K R model Q with different average infectious 
periods and a different number of stages, but the same final size due to identical transmissibility f. 
Parameter values are r = 0.2, n = 10, 7 = 1 and K = 1 (solid) and r = 0.2, n = 10, 7 rj 1.06 and 
K = 3 (dashed). The solution curves for the overall infected population show a radically different 
intermediate behaviour, but with f = 1/6 in both cases, they have the same final epidemic size. 


illustrates Theorem [T] by showing how the final size of an epidemic on a network depends on the 
number of infectious stages and, hence, the shape of the distribution of infectious period, which 
makes it different from earlier analytical results for a well-mixed population (Ma and Earn 2006) 
|44j . This suggests that inclusion of a more realistic population structure has effect not only on 
the intermediate disease dynamics, but also on the final proportion of the population that will be 
affected by the disease. Furthermore, this Figure suggests that for the same mean infectious period, 
the final size of an epidemic is increasing with the increasing number of stages K. One should note 
that the number of stages K has the largest effect on the final size of an epidemic for sufficiently 
low values of K, and then this dependence saturates. As expected, the average node degree n 
plays an important role, with the minimum value of t or K required for an epidemic outbreak 
decreasing with increasing n in perfect agreement with an earlier result in Eq. (HU). Stochastic 
simulations (not shown) demonstrate excellent agreement with the results in Fig. [Tj especially for 
denser networks. The conclusions of Theorem [T] highlight the importance of collecting accurate and 
reliable data about the infectivity profile of a disease for predicting the scale of an outbreak. 

It is worth noting that whilst the final size depends on the distribution of the infectious period, 
this dependence is not necessarily unique. This means that two different distributions of infected 
periods can provide the same transmissibility f, resulting in the same final epidemic size in ac¬ 
cordance with Theorem Q] but having different intermediate dynamics of infection, as illustrated 
in Fig. [5j The consequence of this observation is that although the epidemic threshold and final 
epidemic size can both be accurately computed using an estimate for the transmissibility of the 
disease (Newman 2002) [48], it is not sufficient to correctly predict the dynamics of the infection 
spreading process over time, which can be done with our model. 
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Figure 6: Simulation of a SARS outbreak using data from Table [Tj with n = 10 and N = 1000. 
Lines correspond to a numerical solution of the pairwise model (|5|) (K = 1 solid line, K = 3 dashed 
line), while symbols represent the average of 250 serious outbreaks (K = 1 filled circles, K = 3 
triangles), (a) Homogeneous network, (b) Erdos-Renyi random graph. 

4 Impact of a realistic infectious period distribution: case studies 

In order to test the accuracy of the pairwise model ([5]) and to illustrate the role played by the 
distribution of infectious period, we consider the examples of outbreaks of several diseases mentioned 
in Table |T| in a population that is initially fully susceptible. We concentrate on two common 
and fairly simple network structures, namely, homogeneous and Erdos-Renyi networks (Newman 
2010) [50], with stochastic simulations being performed using a Gillespie algorithm (Gillespie 1977; 
Chen and Bokka 2005) [26, ITT] . We restrict our attention to these network types as we have a 
homogeneous pairwise model and we would not expect it to work well for other networks. Following 
the derivation of the pairwise model, the per-link transmission rate is taken to be r = /3/n, and 
we now perform the comparison of an average of 250 stochastic outcomes of serious epidemics on 
a homogeneous and Erdos-Renyi networks against the results of a pairwise model with gamma 
distributed infectious period. To highlight the impact of including a realistic distribution for the 
infectious period, we compare the results of simulations with realistic values of parameters from 
Tabic m against those obtained using an exponentially distributed infectious period as assumed in 
many existing models. 

Severe Acute Respiratory Syndrome (SARS) is a viral disease characterised by flu-like symp¬ 
toms which is primarily spread through close contacts with infected individuals that makes it a 
perfect candidate for deducing some basic parameters from epidemiological observations. Figure [6] 
illustrates the comparison of SARS dynamics on homogeneous and Erdos-Renyi networks with a 
pairwise approximation. One can observe that the effects of including more stages in the disease 
model on intermediate behaviour are similar to those seen earlier, namely, that gamma distribution 
of infectious period shortens the overall duration an epidemic and increases peak prevalence. It is 
also worth noting that, in accordance with Theorem [H the final size of an epidemic also increases 
with K. 

The second example we consider is smallpox, a viral disease that has been eradicated globally 
except for two stocks kept in the secure labs and being used for further research. Several papers 
have modelled the effectiveness of smallpox when used as a bio-weapon, as well strategies for 
its containment during possible outbreaks (Ferguson et al. 2003; Kaplan Craft and Wein 2002; 
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Figure 7: Simulation of a smallpox outbreak using data from Table [U with n = 10 and N = 1000. 
Lines correspond to a numerical solution of the pairwise model (0 (K = 1 solid line, K = 4 dashed 
line), while symbols represent the average of 250 serious outbreaks (K = 1 filled circles, K = 4 
triangles), (a) Homogeneous network, (b) Erdos-Renyi random graph. 


Meltzer et al. 2001) |2li [33l 36]. Due to a profound impact smallpox has had on a human 
population over several centuries, an extensive and quite accurate data has been collected about 
its transmission. Smallpox is spread through a contact with the mucus of an infected individual, 
which implies that a close contact is essential for a successful disease transmission. In Fig. [7] 
we show the simulations of smallpox outbreaks on homogeneous and Erdos-Renyi networks using 
parameter values from Table |T] compared to results of the numerical solution of the corresponding 
pairwise model ([5]). The first important observation that the higher severity of epidemics outbreaks 
as suggested by these data makes the pairwise model more accurate, as expected. The effect of 
including the realistic distribution of infectious period is more pronounced in this case as compared 
to the SARS simulations, which can be attributed to the fact that smallpox model includes four 
stages of infection, while the SARS model had only three stages. Despite changes in the intermediate 
behaviour for smallpox being more pronounced compared to SARS, the final size of an epidemic 
as given by the pairwise model only increases from 96.34% to 97.89%, which is consistent with an 
earlier observation that the effect of increasing the number of stages on the final epidemic size is 
less noticeable for higher K. 

Figure [8] illustrates the comparison of a pairwise model ([5]) with the closure ([6]) and a stochastic 
simulation on the example of influenza data with different number of stages of infection. Comparison 
of figures (a) and (b) shows that the heterogeneity introduced by the degree distribution makes 
the pairwise model less accurate due to the fact that this model only takes into account the mean 
degree n. This suggests that whilst our model is very helpful for understanding general features 
of multi-stage disease dynamics on networks, it has to be extended further to deal effectively with 
wider and more realistic node degree distributions. One should note that the effects of increasing 
the number of stages on peak prevalence and the duration of epidemics reduce for higher values of 
K, as can be observed by comparing the minor changes between temporal profiles of the three- and 
five-stage influenza epidemics presented as shown in Fig. [8j 
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Figure 8: Simulation of an influenza outbreak using data from Table [T] with n = 10 and N = 1000. 
Lines correspond to a numerical solution of the pairwise model Q (K = 1 solid line, K = 3 dashed 
line, K = 5 dotted line), while symbols represent the average of 250 serious outbreaks (K = 1 filled 
circles, I\ = 3 triangles, K = 5 squares), (a) Homogeneous network, (b) Erdos-Renyi random 
graph. 

5 Discussion 

In this paper we have analysed the behaviour of multi-stage infections with particular emphasis 
on contact networks. Unlike the well-mixed models, for which the number of stages modifies the 
temporary profile of an outbreak but does not affect the final epidemic size or the condition for 
disease outbreak, in the case of disease spread on a network, the number of stages, i.e. the precise 
distribution of infectious period, plays a much more prominent role. 

In order to make analytical progress with the analysis of disease dynamics on networks, we 
have employed the framework of pairwise approximation. This has allowed us to determine the 
probability of disease transmission across a network edge and to find an T^o-hke threshold that 
controls the onset of epidemics. We have also derived an analytical expression for the final size of 
an epidemic, which is in perfect agreement with the final size computed using percolation theory 
(Newman 2002; Kenah and Robins 2007) ji'8'l [39] . and therefore, our findings can be considered 
exact in the limit of infinite population size. All of these quantities depend not only on the basic 
disease characteristics, such as, the transmission rate and the average infectious period, but also on 
the distribution of the infectious period as represented by the number of stages in the model. The 
importance of this result lies in the fact that unlike earlier studies of multi-stage models in well- 
mixed populations (Anderson and Watson 1980; Ma and Earn 2006) [2]j33], for the same average 
duration of the infection period, the final epidemic size is not constant but increasing with the 
number of stages. We also observe that the threshold at which point a major epidemic is expected 
depends on the number of infectious stages, with epidemics becoming more likely as the number 
of stages is increased. This dependence emerges due to the higher resolution of our model which 
allows us to identify new links between model ingredients and disease dynamics. Similar results 
have been noted in related studies, for example, in models concerned with contact tracing (Eames 
and Keeling 2003) [20] and models of coupled disease and information transmission on networks 
(Funk et al. 2009) [25] . 

Numerical simulations of epidemic outbreak for several different multi-stage infections demon¬ 
strate that while the pairwise model provides a reasonably good approximation of the network 
dynamics, the agreement with stochastic simulations is affected by clustering and local network 
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structure that can induce correlations in the dynamics of different nodes, as well as the inhomo¬ 
geneity in the node degree distribution, as should be expected from the fact that the pairwise 
closure only depends on the average node degree. 

There are several directions in which the approach presented in this paper could be extended. 
These include the analysis of SIS and SEIR models, as well as inclusion of multiple stages for both 
the latent and infected classes (Nguyen and Rojani 2008) [51]. Whilst inclusion of latent classes 
may have no effect on the basic reproduction number or the final size distribution in a homoge¬ 
neous model (Black and Ross 2015; House et al. 2013) [7] 3T], whether the same would be true 
in a network model remains to be seen. Another interesting and important problem would be 
the consideration of network dynamics for epidemic models with temporary immunity (Blyuss and 
Kyrychko 2010) [8]. Allowing the level of infectiousness of different nodes to vary depending on the 
stage of infection they belong to would result in even more realistic models of multi-stage diseases 
on networks. One of the challenging but practically important generalisations of the present frame¬ 
work would be an extension of a pairwise model that would account for heterogeneity in node degree 
distribution (House and Keeling 2011a) (29). This would provide deterministic models potentially 
amenable to analytical treatment that would more accurately represent stochastic disease dynamics. 
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Appendix A 

In this Appendix we prove an expression ([9]) for the probability of transmission across a given link. 

For an arbitrary number of stages and transition/recovery parameter Kj, the distribution of the 

infectious period is gamma distributed, and hence we consider here the density function originally 


stated in l]5|) 


g(x; K, l/(iTy)) = ( K ]_ 1 ), (Kl) K x K l e I<1X . 


g(x\K, l/(iDy)) 


Since the probability of infection taking place for a given S — / link during time t is given by 
1 — e~ rt , the probability of transmission across this link in a Lf-stage is given by 




( 20 ) 
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where the final equality is obtained by noting that the first integral is simply the integral of the 
gamma distribution function over M + , and, hence, is equal to one. Integration by parts yields a 
recursive relation 


r °° x K - 1 e~^ +K ^ x dx= K ~ l 


T + if 7 J 0 

which is valid for any integer K > 1, and this then implies 


c K-2 e -(r+Ky)x dx ^ 


r x «-l c -(r+K l)xdx _ (K- 1)1 

Jo CU -(r + K^- 


Substituting this expression into Eq. (1201) yields 

, (K 7 ) k (AT — 1)! 

(K — 1)! (r + K^) K 


(K 7 ) 


K 


(r + K^) K 


Appendix B 

Linearisation of the pairwise model Q with the closure ([U]) at the disease-free equilibrium yields 
the stability condition for eigenvalues A as a (2A' + 2) x (2A' + 2) matrix. It is useful to first consider 
it in a block form as follows, 

A B 
C D 

where C is a zero (A' + 1) x (K + 1) matrix, and the matrix A is lower-diagonal, and therefore, 
its determinant is the product of the diagonal terms. Hence, the characteristic equation can be 
written as 


r(n 


A 2 (A + A" 7 ) x 


1) — A '7 — r — A 
Kj 

0 


r(n — 1 ) 

—A '7 — r — A 0 
A '7 


r(n — 1 ) 
0 


= 0 


0 


o 0 

0 A '7 — A '7 — t — A 


This matrix can now be reduced to a series of lower-diagonal matrices to give the following general 
form of the characteristic equation 


0 = A 2 (A + K^) k 


(r(n 


1) — A '7 — t — A)(—A '7 — t — X) K 1 


/K -1 


= A 2 (A + A' 7 ) k 
— r(n — 1 ) 


+ r(n - 1 ) (- 1 )*- 4 (A' 7 )^(-A '7 - r - Ay 1 J 

{(r + A 7 + X) K 

K -1 

(r + A'7 + A)^" 1 + (A'7) A ^(r + Kj + A)* -1 


2—1 
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It immediately follows that the above equation has roots of A = 0, A = —K 7 , and the other K roots 
are determined by the roots of the expression in curly brackets. Since an epidemic outbreak occurs 
when the disease-free equilibrium becomes unstable, one has to identify conditions on parameters 
when the stability of the disease-free steady state changes, i.e. where A = 0. Substituting A = 0 
into the expression in curly brackets yields 


0= (t + K^) K + r(n 


1 ) 


(r + K^) k 1 


K -1 

i=l 


= (t + Kj) k — (n — 1) (r + K'y) I< — (Kj) k 
This relation can be recast as 

(Kl) K 


1 = (n - 1 ) 1 - 


(r + K j) K 

which gives the desired expression of 1Z = (n — l)f in Eq. (HID . 


= (n- l)f, 


Appendix C 

To prove relation (fI7|) . we consider the time derivatives of the functions Pi = for i = 1,2,... ,K, 
which can be found from the pairwise model (0): 

Pi = ~{t + if 7) A + TfllgpF, 

Pi = -(r + Ky)Pi + /T-yiVi, i = 2,3,..., K. 


We also remind the reader of the functions G and L and equations for their dynamics 


G = IM 


G = K^P k , L = 


[55] 


f [S'S'] j-, 

L = — arbor F- 


[S]° A, - - exp(n[5]l/n) [S] 2a ~ [5] 

Integrating the equation for Pi and using the fact that [SPi](0) = [S/i](oo) = 0, gives 

[SS] 


[5] 


-Fdt 


poo poo pc 

0 = / P\dt = — (t + Kj) / P\dt + ar 
Jo Jo Jo 

poo 

= —(r + K'y) / Pidt — [L(oo) — L(0)]. 

Jo 

In a similar way, integrating the equation for P 2 yields 

poo poo poo 

0 = / P 2 dt = — (t + K'y) / P 2 dt + P '7 / Pidt, 
Jo Jo Jo 


which can be rewritten as 
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00 t + K'y f°° 

Pidt = / Pidt. 

0 a 7 7 o 


Proceeding the the same way, one obtains 

ro ° t + K'y f°° 


' Pidt = ——- / Pj+idt, * = 2,3,..., K - 1. 

0 K 1 Jo 
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K —1 roo 


Going through all stages of infections, we find 

On the other hand, integrating equation for G and using G(0) = 0 gives 


w 

Jo 


Pxdt. 


G( oo) - G(0) = = Kl 
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Pxdt 
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°°p Kdt = 1 ^ 


Kl [fl]So ' 


Combining the last two expressions, we obtain 

(r + I<-i) K - 1 [Sfl] t 
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Jo 


P\dt = 


(K'y) K [5]“, ’ 
and substituting this result into Eq. (12TT) gives the final relation (fT71) : 

[SR\c 


[5]Sc 


= (r- l)[E(oo) -L(0)]. 


( 22 ) 


In order to prove relation (|18l) . we examine the ratio [SS]/[S], whose dynamics is governed by 
the following equation 

d[SS\ = _ r (n r 2)[5S]E^i[5/ i ] 


dt [5] n [5] 

Separating variables and integrating this equation gives 
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(23) 


Rather than compute the integral in the right-hand side of the above equation, we use the first 
equation of the pairwise model ([5]) . which can be written as 

,e£i m 


i d 

[S]dt 
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Integrating this equation gives 


f 

Jo 


1 

]SJ 


Eh m 

[S] 


d[S] = —r 


dt 


Using this expression to replace an integral in (1231) gives 
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Substituting [S]o = N and [SS]o = nN, this formula can be rewritten as 

n —2 
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or alternatively, 
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Multiplying both sides by [S]oo and using the definition a = (n — l)/n, we obtain 

= -wiibw — M 

which gives the desired relation (1181) . 
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